
    An Efficient OpenMP Runtime System for Hierarchical Architectures

    Exploiting the full computational power of ever-deeper hierarchical multiprocessor machines requires a very careful distribution of threads and data over the underlying non-uniform architecture. The emergence of multi-core chips and NUMA machines makes it important to minimize the number of remote memory accesses, to favor cache affinities, and to guarantee fast completion of synchronization steps. By using the BubbleSched platform as a threading backend for the GOMP OpenMP compiler, we are able to easily transpose affinities of thread teams into scheduling hints, using abstractions called bubbles. We then propose a scheduling strategy suited to nested OpenMP parallelism. Preliminary performance evaluations show a significant speedup improvement on a typical NAS OpenMP benchmark application.
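
    As a rough illustration of the workload this strategy targets, the sketch below spawns nested OpenMP teams, the structure that bubbles capture as scheduling hints. It uses only standard OpenMP directives, not the BubbleSched or GOMP internals, and the team sizes are arbitrary assumptions.

        /* Nested OpenMP parallelism: an outer team (e.g. one thread per
         * NUMA node), each member spawning an inner team (e.g. the cores
         * of that node). Bubble scheduling keeps inner teams together. */
        #include <omp.h>
        #include <stdio.h>

        int main(void)
        {
            omp_set_nested(1);              /* allow inner parallel regions */

            #pragma omp parallel num_threads(2)        /* outer team */
            {
                int outer = omp_get_thread_num();
                #pragma omp parallel num_threads(4)    /* inner team */
                printf("team %d, thread %d\n", outer, omp_get_thread_num());
            }
            return 0;
        }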

    An Efficient and Transparent Thread Migration Scheme in the PM2 Runtime System

    This paper describes a new iso-address approach to the dynamic allocation of data in a multithreaded runtime system with thread migration capability. The system guarantees that migrated threads and their associated static data are relocated at exactly the same virtual address on the destination nodes, so that no post-migration processing is needed to keep pointers valid. In the experiments reported, a thread can be migrated in less than 75 ”s.
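
    The iso-address idea can be sketched with a plain mmap: if every node reserves the same virtual address range up front, data copied to the destination lands at an identical address and all pointers into it remain valid. The base address, area size, and flags below are illustrative assumptions, not PM2's actual allocator; MAP_FIXED_NOREPLACE is Linux-specific (4.17+).

        /* Reserve the same virtual range on every node, so migrated data
         * keeps its addresses. Constants are illustrative, not PM2's. */
        #define _GNU_SOURCE             /* for MAP_FIXED_NOREPLACE (Linux) */
        #include <stdio.h>
        #include <sys/mman.h>

        #define ISO_BASE ((void *)0x200000000000ULL)  /* same on all nodes */
        #define ISO_SIZE (64UL << 20)                 /* 64 MB area        */

        int main(void)
        {
            /* MAP_FIXED_NOREPLACE fails rather than clobber existing maps. */
            void *area = mmap(ISO_BASE, ISO_SIZE, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
                              -1, 0);
            if (area != ISO_BASE) { perror("mmap"); return 1; }

            int *counter = (int *)area;   /* this pointer would stay valid */
            *counter = 42;                /* on any node mapping ISO_BASE  */
            printf("counter @ %p = %d\n", (void *)counter, *counter);
            return 0;
        }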

    EASYPAP: a Framework for Learning Parallel Programming

    This paper presents EASYPAP, an easy-to-use programming environment designed to help students learn parallel programming. EASYPAP features a wide range of 2D computation kernels that students are invited to parallelize using Pthreads, OpenMP, OpenCL or MPI. Execution of kernels can be interactively visualized, and powerful monitoring tools allow students to observe both the scheduling of computations and the assignment of 2D tiles to threads/processes. By focusing on algorithms and data distribution, students can experiment with diverse code variants and tune multiple parameters, resulting in richer problem exploration and faster progress towards efficient solutions. We present selected lab assignments which illustrate how EASYPAP improves the way students explore parallel programming.
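
    The sketch below shows the kind of tiled 2D kernel such assignments revolve around, parallelized with plain OpenMP. The image layout, tile size, and function names are illustrative assumptions, not EASYPAP's actual API.

        /* A toy per-pixel update over TILE x TILE tiles of a 2D image,
         * parallelized across tiles with OpenMP. */
        #include <omp.h>

        #define DIM  1024
        #define TILE 32

        static unsigned image[DIM][DIM];

        static void do_tile(int x, int y)       /* process one tile */
        {
            for (int i = y; i < y + TILE; i++)
                for (int j = x; j < x + TILE; j++)
                    image[i][j] = (image[i][j] + 1) & 0xff;
        }

        int main(void)
        {
            /* collapse(2) exposes all tiles as one parallel iteration space;
             * schedule(runtime) lets students compare scheduling policies. */
            #pragma omp parallel for collapse(2) schedule(runtime)
            for (int y = 0; y < DIM; y += TILE)
                for (int x = 0; x < DIM; x += TILE)
                    do_tile(x, y);
            return 0;
        }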

    Efficient shared memory message passing for inter-VM communications

    Thanks to recent advances in virtualization technologies, it is now possible to benefit from the flexibility brought by virtual machines at little cost in terms of CPU performance. However, on HPC clusters, some overheads remain which prevent widespread use of virtualization. In this article, we tackle the issue of inter-VM MPI communications when VMs are located on the same physical machine. To achieve this, we introduce a virtual device which provides a simple message-passing API to the guest OS. This interface can then be used to implement an efficient MPI library for virtual machines. The use of a virtual device makes our solution easily portable across multiple guest operating systems, since it only requires a small driver to be written for this device. We present an implementation based on Linux, the KVM hypervisor and Qemu as its userspace device emulator. Our implementation achieves near-native performance in terms of MPI latency and bandwidth.
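
    A user-space analogue of the mechanism can be sketched with POSIX shared memory: two processes map the same buffer and pass a message through it, with a flag to publish completion. The paper's design does this across VM boundaries through a virtual device; the segment name and channel layout below are illustrative assumptions.

        /* Two processes exchange a message through a shared mapping.
         * Run the receiver first, then "./demo send" from a second
         * process. Link with -lrt on older glibc. */
        #include <fcntl.h>
        #include <stdio.h>
        #include <string.h>
        #include <sys/mman.h>
        #include <unistd.h>

        struct channel { volatile int ready; char payload[256]; };

        int main(int argc, char **argv)
        {
            int fd = shm_open("/intervm-demo", O_CREAT | O_RDWR, 0600);
            ftruncate(fd, sizeof(struct channel));
            struct channel *ch = mmap(NULL, sizeof *ch, PROT_READ | PROT_WRITE,
                                      MAP_SHARED, fd, 0);

            if (argc > 1 && !strcmp(argv[1], "send")) {
                strcpy(ch->payload, "hello across the VM boundary");
                __sync_synchronize();      /* publish payload before flag  */
                ch->ready = 1;
            } else {
                while (!ch->ready)         /* spin until the sender is done */
                    ;
                printf("received: %s\n", ch->payload);
                shm_unlink("/intervm-demo");
            }
            munmap(ch, sizeof *ch);
            return 0;
        }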

    A unified runtime system for heterogeneous multicore architectures

    Approaching the theoretical performance of heterogeneous multicore architectures, equipped with specialized accelerators, is a challenging issue. Unlike regular CPUs, which can transparently access the whole global memory address range, accelerators usually embed local memory on which they perform all their computations using a specific instruction set. While many research efforts have been devoted to offloading parts of a program onto such coprocessors, the real challenge is to find a programming model providing a unified view of all available computing units. In this paper, we present an original runtime system providing a high-level, unified execution model allowing seamless execution of tasks over the underlying heterogeneous hardware. The runtime is based on a hierarchical memory management facility and on a codelet scheduler. We demonstrate the efficiency of our solution with an LU decomposition on both homogeneous (3.8 speedup on 4 cores) and heterogeneous machines (95% efficiency). We also show that "granularity-aware" scheduling can improve execution time by 35%.
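
    The codelet concept can be sketched as a task descriptor carrying several architecture-specific implementations, from which the runtime picks one at dispatch time. This mimics the idea only; the struct and scheduling rule below are illustrative assumptions, not the runtime's real API.

        /* A "codelet": one logical task, several implementations. */
        #include <stdio.h>

        typedef void (*kernel_fn)(void *buffers, void *arg);

        struct codelet {
            kernel_fn cpu_func;    /* version for a regular core */
            kernel_fn accel_func;  /* version for an accelerator */
        };

        static void scal_cpu(void *b, void *a)
        { (void)b; (void)a; printf("running on a CPU core\n"); }

        static void scal_accel(void *b, void *a)
        { (void)b; (void)a; printf("offloaded to an accelerator\n"); }

        /* Toy dispatch rule: use the accelerator version when one is idle. */
        static void submit(struct codelet *cl, void *buffers, void *arg,
                           int accelerator_idle)
        {
            (accelerator_idle && cl->accel_func ? cl->accel_func
                                                : cl->cpu_func)(buffers, arg);
        }

        int main(void)
        {
            struct codelet scal = { scal_cpu, scal_accel };
            submit(&scal, NULL, NULL, 1);   /* accelerator available */
            submit(&scal, NULL, NULL, 0);   /* fall back to the CPU  */
            return 0;
        }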

    SPAWN: An Iterative, Potentials-Based, Dynamic Scheduling and Partitioning Tool

    Many applications of physics modeling use regular meshes on which computations of highly variable cost can occur. Distributing the underlying cells over manycore architectures is a critical load-balancing step that should maximize the time until another such step is required. Graph partitioning tools are known to be very effective for such problems, but they exhibit scalability problems as the number of cores and the number of cells increase. We introduce a dynamic task scheduling approach inspired by physical particle interactions. Our method lets cores virtually move over a 2D/3D mesh of tasks and uses a Voronoi domain decomposition to balance workload among cores. Displacements of the cores are the result of force computations using a carefully chosen pair potential. We evaluate our method against graph partitioning tools and existing task schedulers on a representative physical application, and demonstrate the relevance of our approach.
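
    The mechanics can be illustrated with a toy relaxation: core "particles" move in 2D under a pair force, and each mesh cell is then assigned to its nearest core, a discrete Voronoi decomposition. The simple 1/d^2 repulsion, step size, and grid below are illustrative assumptions; SPAWN's actual potential is carefully chosen and accounts for workload.

        /* Cores repel each other inside the unit square, then each mesh
         * cell is assigned to its nearest core (discrete Voronoi). */
        #include <math.h>
        #include <stdio.h>

        #define CORES 4
        #define STEPS 1000
        #define GRID  16

        static double px[CORES] = {0.40, 0.50, 0.60, 0.50};
        static double py[CORES] = {0.50, 0.40, 0.50, 0.60};

        int main(void)
        {
            /* Relax core positions under a repulsive 1/d^2 pair force. */
            for (int s = 0; s < STEPS; s++)
                for (int i = 0; i < CORES; i++) {
                    double fx = 0, fy = 0;
                    for (int j = 0; j < CORES; j++) {
                        if (j == i) continue;
                        double dx = px[i] - px[j], dy = py[i] - py[j];
                        double d = sqrt(dx * dx + dy * dy) + 1e-9;
                        fx += dx / (d * d * d);
                        fy += dy / (d * d * d);
                    }
                    px[i] = fmin(fmax(px[i] + 1e-5 * fx, 0.0), 1.0);
                    py[i] = fmin(fmax(py[i] + 1e-5 * fy, 0.0), 1.0);
                }

            /* Voronoi decomposition: each cell goes to its nearest core. */
            int owned[CORES] = {0};
            for (int cy = 0; cy < GRID; cy++)
                for (int cx = 0; cx < GRID; cx++) {
                    double x = (cx + 0.5) / GRID, y = (cy + 0.5) / GRID;
                    int best = 0;
                    for (int i = 1; i < CORES; i++)
                        if (hypot(x - px[i], y - py[i]) <
                            hypot(x - px[best], y - py[best]))
                            best = i;
                    owned[best]++;
                }
            for (int i = 0; i < CORES; i++)
                printf("core %d at (%.2f, %.2f) owns %3d cells\n",
                       i, px[i], py[i], owned[i]);
            return 0;
        }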

    A Multithreaded Runtime Environment with Thread Migration for HPF and C* Data-Parallel Compilers

    This paper studies the benefits of compiling data-parallel languages onto a multithreaded runtime environment that provides a dynamic thread migration facility. Each abstract process is mapped onto a thread, so that dynamic load balancing can be achieved by migrating threads among the processing nodes. We describe and evaluate an implementation of this idea in the Adaptor HPF and UNH C* data-parallel compilers. We show that no deep modifications of the compilers are needed, and that the overhead of managing threads can be kept small. As an experimental validation, we report on an HPF implementation of the Gauss partial pivoting algorithm. We show that an initial BLOCK data distribution combined with our dynamic load balancing scheme can reach the performance of the optimal CYCLIC distribution.
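
    The imbalance that migration corrects can be shown with a toy cost model: in Gaussian elimination the active part of the matrix shrinks over time, so if the work attached to column col is assumed to grow linearly with col, a static BLOCK distribution loads the last node far more heavily than a CYCLIC one. The matrix size, node count, and cost model below are arbitrary assumptions for illustration.

        /* Tally per-node work under BLOCK and CYCLIC distributions,
         * assuming work(col) = col + 1 (later columns cost more). */
        #include <stdio.h>

        #define N     1024   /* matrix columns   */
        #define NODES 4      /* processing nodes */

        int main(void)
        {
            long block[NODES] = {0}, cyclic[NODES] = {0};

            for (int col = 0; col < N; col++) {
                long work = col + 1;
                block[col / (N / NODES)] += work;  /* contiguous chunks */
                cyclic[col % NODES]      += work;  /* round-robin       */
            }
            for (int n = 0; n < NODES; n++)
                printf("node %d: block=%8ld  cyclic=%8ld\n",
                       n, block[n], cyclic[n]);
            return 0;
        }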

    NewMadeleine: Scheduling and Optimization of High-Performance Communication Schemes

    Despite the spectacular progress made by communication interfaces for high-speed networks over the last fifteen years, many potential optimizations still escape communication libraries. This is mainly due to designs focused on trimming the critical path to the extreme in order to minimize latency. In this article, we present a new communication library architecture built around a powerful transfer-optimization engine whose activity is synchronized with that of the network cards. The code of the optimization strategies is generic and portable, and is parameterized at runtime by the capabilities of the underlying network drivers. The database of predefined optimization strategies is easily extensible. Moreover, the scheduler is able to globally mix multiple logical flows over one or several physical cards, potentially of different technologies in a heterogeneous multi-rail configuration.
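
    The core idea of the optimization engine can be sketched as a strategy that runs while the NIC is busy: pending small messages accumulate in a queue, and the strategy packs as many as fit into a single packet before the next transmission. The queue layout, MTU, and function names below are illustrative assumptions, not NewMadeleine's actual interface.

        /* Pack as many queued messages as fit into one MTU-sized packet. */
        #include <stdio.h>
        #include <string.h>

        #define MTU 1024

        struct msg { const char *data; size_t len; };

        static size_t aggregate(struct msg *queue, int n,
                                char *packet, int *consumed)
        {
            size_t used = 0;
            *consumed = 0;
            for (int i = 0; i < n && used + queue[i].len <= MTU; i++) {
                memcpy(packet + used, queue[i].data, queue[i].len);
                used += queue[i].len;
                (*consumed)++;
            }
            return used;
        }

        int main(void)
        {
            struct msg queue[] = { {"tag:1 ", 6}, {"hello ", 6}, {"world", 5} };
            char packet[MTU];
            int consumed;
            size_t len = aggregate(queue, 3, packet, &consumed);
            printf("sent %d messages in one %zu-byte packet\n", consumed, len);
            return 0;
        }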

    Improving Reactivity to I/O Events in Multithreaded Environments Using a Uniform, Scheduler-Centric API

    Reactivity to I/O events is a crucial factor in the performance of modern multithreaded distributed systems. In our scheduler-centric approach, an application detects I/O events by requesting a service from a detection server through a simple, uniform API. We show that a good choice for this detection server is the thread scheduler itself. This approach simplifies application programming, significantly improves performance, and provides much tighter control over reactivity.
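
    The pattern can be sketched with Pthreads: client threads do not poll their file descriptors themselves, but hand them to a single detection server and sleep until it signals readiness (in the paper, that server is the thread scheduler). The single-fd server thread below is an illustrative stand-in, not the paper's API.

        /* A detection-server thread waits on poll(2) and wakes the client
         * thread that registered interest. Build with -pthread. */
        #include <poll.h>
        #include <pthread.h>
        #include <stdio.h>
        #include <unistd.h>

        static struct pollfd watched;      /* one watched fd, for brevity */
        static pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
        static pthread_cond_t  ready = PTHREAD_COND_INITIALIZER;
        static int fd_ready = 0;

        static void *detection_server(void *arg)
        {
            (void)arg;
            poll(&watched, 1, -1);         /* wait for the I/O event */
            pthread_mutex_lock(&lock);
            fd_ready = 1;                  /* wake the blocked client */
            pthread_cond_signal(&ready);
            pthread_mutex_unlock(&lock);
            return NULL;
        }

        int main(void)
        {
            int pipefd[2];
            pipe(pipefd);
            watched.fd = pipefd[0];
            watched.events = POLLIN;

            pthread_t srv;
            pthread_create(&srv, NULL, detection_server, NULL);

            write(pipefd[1], "x", 1);      /* trigger the event */

            pthread_mutex_lock(&lock);     /* client sleeps on the server */
            while (!fd_ready)
                pthread_cond_wait(&ready, &lock);
            pthread_mutex_unlock(&lock);
            puts("client woken by the detection server");
            pthread_join(srv, NULL);
            return 0;
        }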
    • 
